Quantum circuits of T-depth one 
We give a Clifford+T representation of the Toffoli gate of T-depth 1, using four ancillas. More generally, 
we describe a class of circuits whose T-depth can be reduced to 1 by using sufficiently many ancillas. We 
show that the cost of adding an additional control to any controlled gate is at most 8 additional T-gates, 
and T-depth 2. We also show that the circuit THT does not possess a T-depth 1 representation with an 
arbitrary number of ancillas initialized to $|0\rangle$. 



1 Introduction 

It is known that the gates of the Clifford group, together 
with the single-qubit non-Clifford gate 
$T = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\pi/4} \end{pmatrix}$, 
form a good universal gate set for fault-tolerant quantum 
computation [2]. The decomposition of arbitrary gates 
into this Clifford+T set, either exactly or to within some 
given accuracy $\epsilon$, is an important problem [3]. It is often 
desirable to find decompositions that are optimal with 
respect to a given cost function. The exact cost function 
used is application dependent; some possibilities are: the 
total number of gates; the total number of T-gates; the 
circuit depth; and/or the number of ancillas used. 

Amy et al. [1] recently proposed T-depth as a cost 
function. The idea is to count the number of T-stages in 
a circuit, rather than the number of T-gates. A T-stage 
is a group of one or more $T$- and/or $T^\dagger$-gates on distinct 
qubits that can be performed simultaneously. Note that, 
for the purpose of computing T-count or T-depth, the 
gates $T$ and $T^\dagger$ can be treated interchangeably, due to 
the identity $T^\dagger = T S^\dagger$, in which $S^\dagger$ is a Clifford gate. 
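
To make the difference between T-count and T-depth concrete, here is a minimal Python sketch that computes both for a circuit already arranged into time steps. The layer-based encoding and the names `t_count` and `t_depth` are assumptions introduced for illustration only. The sketch reports the number of T-stages of the layering it is given, not the minimum over all equivalent circuits.

```python
from typing import List, Tuple

Gate = Tuple[str, Tuple[int, ...]]  # (gate name, qubits it acts on)
Layer = List[Gate]                  # gates performed in the same time step

T_LIKE = {"T", "Tdg"}  # T and T-dagger are treated interchangeably


def t_count(layers: List[Layer]) -> int:
    """Total number of T- and T-dagger gates in the circuit."""
    return sum(1 for layer in layers for name, _ in layer if name in T_LIKE)


def t_depth(layers: List[Layer]) -> int:
    """Number of layers containing at least one T- or T-dagger gate."""
    return sum(1 for layer in layers if any(name in T_LIKE for name, _ in layer))


# Hypothetical example circuit: three T-like gates arranged in two T-stages.
example = [
    [("H", (0,))],
    [("T", (0,)), ("Tdg", (1,))],  # one T-stage containing two gates
    [("CNOT", (0, 1))],
    [("T", (1,))],                 # a second T-stage
]

print(t_count(example), t_depth(example))
```

In this example the two simultaneous gates in the second layer count twice towards the T-count but only once towards the T-depth.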

To illustrate the concept of T-depth, consider the 
standard decomposition of the Toffoli gate into the 
Clifford+T set, as given in [4]: 







[Circuit diagram: the standard Clifford+T decomposition of the Toffoli gate]   (1) 



This decomposition has T-count 7, and in the exact form 
written, it has T-depth 6, because the fourth and fifth 
T-gates form a single T-stage. Using trivial commutations, 
the circuit (1) can easily be reduced to T-depth 4: 

Amy et al. [1] further improved the T-depth of the Toffoli 
gate to 3, using the following circuit. They conjecture 
that for circuits without ancillas, this T-depth is optimal. 

The purpose of this note is to show that, with the 
use of ancillas, the T-depth of the Toffoli gate, and of 
many (but not all) other circuits, can be reduced to 1. 
This may be useful in quantum computing architectures 
where T-gates are expensive and ancillas are cheap. 

2 A T-depth one representation 
of the Toffoli gate 

Recall that the Clifford group for any number of qubits 
is generated by the Hadamard gate $H$, the phase gate 
$S = T^2$, the controlled not-gate, and unit scalars. As 
usual, we write $X$, $Y$, and $Z$ for the Pauli operators. 

The Toffoli gate is a doubly-controlled not-gate. It 
is equivalent to a doubly-controlled $Z$-gate via a basis 
change: 

$$\mathrm{Toffoli} = (I \otimes I \otimes H) \, \mathrm{CCZ} \, (I \otimes I \otimes H), \tag{4}$$

where CCZ denotes the doubly-controlled $Z$-gate and the 
Hadamard gates act on the target qubit. 

Now consider a computational basis state $|xyz\rangle$, where 
$x, y, z \in \{0,1\}$. The effect of the doubly-controlled $Z$-gate 
is to map $|xyz\rangle$ to $(-1)^{xyz} |xyz\rangle$. Let us write "$\oplus$" 
for modulo-2 addition in $\{0,1\}$, and "$+$" and "$-$" for 
the usual addition and subtraction of integers. We then 
have the following inclusion-exclusion style formula for 
$x, y, z \in \{0,1\}$: 

$$4xyz = x + y + z - (x \oplus y) - (y \oplus z) - (x \oplus z) + (x \oplus y \oplus z). \tag{5}$$

This is easy to prove by case distinction, or algebraically 
using $x \oplus y = x + y - 2xy$. Now let $\omega = (-1)^{1/4} = e^{i\pi/4}$. 

Figure 1: T-depth 1 representation of the Toffoli gate 



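From (5), we have 

$$(-1)^{xyz} = \omega^{4xyz} = \omega^x \, \omega^y \, \omega^z \, (\omega^\dagger)^{x \oplus y} \, (\omega^\dagger)^{y \oplus z} \, (\omega^\dagger)^{x \oplus z} \, \omega^{x \oplus y \oplus z}. \tag{6}$$
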

Note that $T|x\rangle = \omega^x |x\rangle$, and therefore the 
doubly-controlled $Z$-gate can be implemented by applying 
$T$-gates to qubits in states $|x\rangle$, $|y\rangle$, $|z\rangle$, and 
$|x \oplus y \oplus z\rangle$, and $T^\dagger$-gates to qubits in states 
$|x \oplus y\rangle$, $|y \oplus z\rangle$, and $|x \oplus z\rangle$. 
This can be done in any order, or even in parallel, using 
four ancillas, as shown in Figure 1. Combining this 
with (4), we obtain a representation of the Toffoli gate 
of T-depth 1 and overall depth 7. 
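
The counting formula (5) and the resulting phase identity (6) can be checked by brute force over the eight basis states. The following Python sketch is only a numerical sanity check; the function names `parities` and `phase_from_t_gates` are hypothetical and not part of the construction itself.

```python
import cmath
from itertools import product

OMEGA = cmath.exp(1j * cmath.pi / 4)  # omega = e^{i pi / 4}


def parities(x: int, y: int, z: int):
    """Values of the seven qubits: those receiving T, those receiving T-dagger."""
    plus = [x, y, z, x ^ y ^ z]
    minus = [x ^ y, y ^ z, x ^ z]
    return plus, minus


def phase_from_t_gates(x: int, y: int, z: int) -> complex:
    """Total phase picked up when all seven gates act in a single T-stage."""
    plus, minus = parities(x, y, z)
    phase = 1 + 0j
    for b in plus:
        phase *= OMEGA ** b
    for b in minus:
        phase *= OMEGA.conjugate() ** b
    return phase


formula_5_holds = all(
    4 * x * y * z == sum(parities(x, y, z)[0]) - sum(parities(x, y, z)[1])
    for x, y, z in product((0, 1), repeat=3)
)
formula_6_holds = all(
    abs(phase_from_t_gates(x, y, z) - (-1) ** (x * y * z)) < 1e-12
    for x, y, z in product((0, 1), repeat=3)
)
print(formula_5_holds, formula_6_holds)
```

Because the seven gates act on seven distinct qubits (three inputs and four ancillas holding parities), the order of the loop is irrelevant, which is what allows them to form a single T-stage.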

Remark 2.1. It is interesting to note that the decompositions 
of Nielsen and Chuang (1) and Amy et al. (3) 
follow precisely the same pattern, i.e., they can both be 
seen to be direct implementations of (6). The only difference 
is that in each of the circuits, one of the $T$-gates 
has been needlessly decomposed into $T^\dagger$ and $S$. 

3 An application to multiply-controlled 
gates 

Consider the following gate: 

The doubly-controlled $Z$-gate is a diagonal gate whose 
effect is given by $(-1)^{xyz}$. The controlled $S^\dagger$-gate is a diagonal 
gate whose effect is given by $(-i)^{xy} = (\omega^\dagger)^x (\omega^\dagger)^y \omega^{x \oplus y}$. 

It follows that the combined effect of the two gates is 

$$(-1)^{xyz} (-i)^{xy} = \omega^z \, (\omega^\dagger)^{y \oplus z} \, (\omega^\dagger)^{x \oplus z} \, \omega^{x \oplus y \oplus z}, \tag{8}$$

which can therefore be implemented with T-count 4. Using 
one ancilla, this can be done with T-depth 1 and 
overall depth 5: 



Alternatively, one can find an implementation that uses 
no ancilla. It uses fewer overall gates, but has T-depth 2 
and overall depth 7: 

Let us write 

and we use the mirror image notation to denote the 
inverse of this gate. Suppose we have a Clifford+T 
representation of some controlled quantum gate $G$, and 
we wish to obtain an efficient Clifford+T representation 
of a doubly-controlled $G$-gate. Using (9), (11), and (12), 
the cost of doing so is at most 8 additional T-gates, 
increasing the T-depth by at most 2, and the overall depth 
by at most 14, using 2 ancillas: 

Note that the cost of the additional control, in terms of 
the overall gate count, is 28 (2 times 12 gates from (9) 
and 2 times 2 Hadamard gates from (11)). This can be 
reduced to 26 by leaving the ancilla in (9) in state $|x\rangle$ 
instead of $|0\rangle$; however, doing so requires carrying this 
ancilla during the computation of $G$, which may involve 
a tradeoff. 

If (10) is used instead of (9), the overall gate count cost 
of (12) decreases to 22, and the ancilla use to 1. However, 
the depth and T-depth cost increase to 18 and 4, 
respectively. 
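
The phase identity (8), on which the T-count 4 implementation rests, can be verified by enumerating all basis states. The Python sketch below is a numerical sanity check only; the helper names `combined_phase` and `four_t_gate_phase` are hypothetical.

```python
import cmath
from itertools import product

OMEGA = cmath.exp(1j * cmath.pi / 4)  # omega = e^{i pi / 4}
OMEGA_DAG = OMEGA.conjugate()


def controlled_s_dagger_phase(x: int, y: int) -> complex:
    """Phase (-i)^{xy} written with three powers of omega."""
    return OMEGA_DAG ** x * OMEGA_DAG ** y * OMEGA ** (x ^ y)


def combined_phase(x: int, y: int, z: int) -> complex:
    """Left-hand side of (8): doubly-controlled Z times controlled S-dagger."""
    return (-1) ** (x * y * z) * (-1j) ** (x * y)


def four_t_gate_phase(x: int, y: int, z: int) -> complex:
    """Right-hand side of (8): only four T-like gates remain."""
    return (OMEGA ** z * OMEGA_DAG ** (y ^ z)
            * OMEGA_DAG ** (x ^ z) * OMEGA ** (x ^ y ^ z))


max_error = max(
    abs(combined_phase(x, y, z) - four_t_gate_phase(x, y, z))
    for x, y, z in product((0, 1), repeat=3)
)
s_dagger_ok = all(
    abs(controlled_s_dagger_phase(x, y) - (-1j) ** (x * y)) < 1e-12
    for x, y in product((0, 1), repeat=2)
)
print(max_error < 1e-12, s_dagger_ok)
```

The three factors contributed by the controlled $S^\dagger$-gate cancel against three of the seven factors in (6), which is why four gates suffice.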

Remark 3.1. The above construction can be iterated to 
add $n$ additional controls to a controlled gate at the cost 
of T-count $8n$ and T-depth $2\lfloor \log_2 n + 1 \rfloor$. The logarithm 
in the expression for T-depth arises because a pair of 
T-stages is sufficient to double the number of controls, as 
shown here for $n = 3$: 

For example, this yields an implementation of a triply-controlled 
not-gate with T-count 15 and T-depth 3 (7 T-gates 
for the Toffoli gate, and 8 T-gates for the additional 
control); or a quintuply-controlled not-gate with T-count 
31 and T-depth 5. It is not currently known whether any 
of these T-counts or depths are optimal. 
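
The cost formulas of Remark 3.1 are easy to tabulate. The following Python sketch assumes the Toffoli gate of Section 2 (T-count 7, T-depth 1) as the starting point; the function names are hypothetical, and `n.bit_length()` is used as an exact integer form of $\lfloor \log_2 n + 1 \rfloor$ for $n \geq 1$.

```python
from typing import Tuple

TOFFOLI_T_COUNT = 7  # T-count of the Toffoli gate from Section 2
TOFFOLI_T_DEPTH = 1  # T-depth of the Toffoli gate from Section 2


def added_cost(n: int) -> Tuple[int, int]:
    """(T-count, T-depth) added by n >= 1 additional controls (Remark 3.1)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # For n >= 1, n.bit_length() equals floor(log2(n) + 1).
    return 8 * n, 2 * n.bit_length()


def controlled_not_cost(total_controls: int) -> Tuple[int, int]:
    """(T-count, T-depth) of a not-gate with the given number of controls (>= 2)."""
    extra = total_controls - 2
    if extra == 0:
        return TOFFOLI_T_COUNT, TOFFOLI_T_DEPTH
    count, depth = added_cost(extra)
    return TOFFOLI_T_COUNT + count, TOFFOLI_T_DEPTH + depth


for controls in (2, 3, 5):
    print(controls, controlled_not_cost(controls))
```

For 3 and 5 controls this reproduces the figures quoted above: T-count 15 with T-depth 3, and T-count 31 with T-depth 5.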

Remark 3.2. Because the $T$-gate is diagonal with 
$T|0\rangle = |0\rangle$, it can be regarded as a controlled gate, 
namely, a controlled global phase change. Therefore, we 
can use the above procedure to implement a controlled 
$T$-gate with T-count 9 as follows: 

Using (9), we obtain T-depth 3, depth 15, and gate 
count 29 with two ancillas. As before, by leaving the ancilla 
of (9) in state $|x\rangle$ instead of $|0\rangle$, the gate count can 
be reduced to 27. Alternatively, using (10), we obtain 
T-depth 5, depth 19, and gate count 27 with one ancilla. 
Except for slightly improved overall gate counts, these 
results are the same as those in [1]. 

4 T-depth one representation of 
almost classical circuits 

It is straightforward to generalize the construction of 
Section 2 to circuits built up from $T$ and almost classical 
gates. 

Definition 4.1. A unitary operator is classical if it is 
given by a permutation of computational basis states, 
and diagonal if its matrix representation is diagonal in 
the computational basis. Let us call an operator almost 
classical if it can be written as a product of a classical 
operator and a diagonal operator. 

The almost classical operators obviously form a group. 
Of the 24 single-qubit Clifford operators (taken modulo 
global phase), exactly 8 are almost classical; they form 
the subgroup generated by S and X. 
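
The count of 8 almost classical operators among the 24 single-qubit Clifford operators can be reproduced by direct enumeration. The Python sketch below generates both groups modulo global phase from their generators. The canonical-form trick (rescaling so that the first nonzero entry is real and positive, then rounding) is an assumption made here to compare floating-point matrices, and all function names are hypothetical.

```python
from math import sqrt


def matmul(p, q):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def canonical(m):
    """Key identifying m modulo global phase (first nonzero entry made positive)."""
    flat = [m[0][0], m[0][1], m[1][0], m[1][1]]
    pivot = next(v for v in flat if abs(v) > 1e-9)
    phase = pivot / abs(pivot)
    return tuple((round((v / phase).real, 6), round((v / phase).imag, 6)) for v in flat)


def generate(generators):
    """All products of the generators, modulo global phase."""
    identity = ((1, 0), (0, 1))
    seen = {canonical(identity): identity}
    frontier = [identity]
    while frontier:
        m = frontier.pop()
        for g in generators:
            candidate = matmul(g, m)
            key = canonical(candidate)
            if key not in seen:
                seen[key] = candidate
                frontier.append(candidate)
    return seen


def is_almost_classical(key):
    """True if the matrix is a permutation times a diagonal (one entry per row)."""
    a, b, c, d = (abs(complex(re, im)) > 1e-9 for re, im in key)
    return (a and d and not b and not c) or (b and c and not a and not d)


R = 1 / sqrt(2)
H = ((R, R), (R, -R))
S = ((1, 0), (0, 1j))
X = ((0, 1), (1, 0))

cliffords = set(generate([H, S]))
almost_classical = {k for k in cliffords if is_almost_classical(k)}
generated_by_s_and_x = set(generate([S, X]))
print(len(cliffords), len(almost_classical), almost_classical == generated_by_s_and_x)
```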

Definition 4.2. Let $C$ be a set of gates. We say that a 
circuit is $C + T$-representable if it can be built with gates 
from $C \cup \{T\}$ and their inverses. We say that such a 
circuit has T-depth $n$ (relative to $C$) if it can be written 
using only gates from $C$ and $n$ T-stages. 



Theorem 4.3. Let $C$ be any set of almost classical gates, 
containing the controlled not-gate. Using ancillas, any 
$C + T$-representable $n$-qubit circuit can be written with 
T-depth 1 (relative to $C$). 

Proof. The proof idea is simple. Each $T$-gate in the circuit 
is a $\pi/4$ phase change conditioned on some boolean 
combination of the inputs. Intuitively, one may copy 
each such boolean condition to an ancilla, execute all 
$T$-gates in parallel, uncompute the ancillas, and finally 
re-compute the output. 

The formal proof proceeds by induction on circuits. 
For each $C + T$-representable $n$-qubit circuit $A$, we will 
by induction construct $C + T$-representable circuits $A_1$ 
and $A_2$ such that $A_1$ is diagonal and has T-depth at most 
1, $A_2$ has T-depth 0, and $A = A_2 \circ A_1$. 

The base case occurs when $A = I$ is the identity circuit. 
In this case, we can let $A_1 = A_2 = I$, and there is 
nothing to show. 

For the induction step, suppose $A$ is of the form $A' \circ G$, 
where $G$ is a single gate. By induction hypothesis, there 
is a decomposition $A' = A'_2 \circ A'_1$ satisfying the above 
conditions. 

• Case 1: $G$ is not equal to $T$ or $T^\dagger$. In this case, we 
let $A_1 = G^\dagger \circ A'_1 \circ G$ and $A_2 = A'_2 \circ G$. Then trivially, 
$A = A_2 \circ A_1$, and $A_1$ and $A_2$ have the required 
T-depths. Moreover, since $G$ is almost classical, $A_1$ is 
diagonal. 

• Case 2: $G$ is $T$, applied to the $i$th qubit. In this 
case, we let 

and $A_2 = A'_2$. Since $A'_1$ is diagonal, so is $A_1$, and 
it follows that the ancilla is uncomputed correctly. 
Moreover, $A_1$ is equivalent to $A'_1 \circ G$, and therefore, 
$A = A_2 \circ A_1$. Finally, since $A'_1$ has T-depth at most 
1, so does $A_1$. 

• Case 3: $G$ is $T^\dagger$, applied to the $i$th qubit. This is 
entirely analogous to case 2. □ 
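
The informal proof idea (copy each boolean condition to an ancilla, apply all $T$-gates in one stage, uncompute, then re-compute the output) can be illustrated on basis states. The Python sketch below assumes the special case $C = \{\mathrm{CNOT}, X\}$, so that every wire carries an affine boolean function of the inputs. It follows the intuitive idea rather than the inductive construction of the formal proof, and the gate encoding and function names are hypothetical.

```python
from itertools import product

# Gate encoding (an assumption of this sketch):
#   ("CNOT", control, target), ("X", q), ("T", q), ("Tdg", q)


def run_sequential(circuit, bits):
    """Run the circuit on a basis state; the phase is a power of omega mod 8."""
    bits, phase = list(bits), 0
    for gate in circuit:
        if gate[0] == "CNOT":
            bits[gate[2]] ^= bits[gate[1]]
        elif gate[0] == "X":
            bits[gate[1]] ^= 1
        elif gate[0] == "T":
            phase += bits[gate[1]]
        elif gate[0] == "Tdg":
            phase -= bits[gate[1]]
    return tuple(bits), phase % 8


def split(circuit, n):
    """Separate the boolean conditions of the T-gates from the classical part."""
    wires = [(1 << q, 0) for q in range(n)]  # wire value = parity(mask & input) ^ const
    conditions, classical = [], []
    for gate in circuit:
        if gate[0] == "CNOT":
            c, t = gate[1], gate[2]
            wires[t] = (wires[t][0] ^ wires[c][0], wires[t][1] ^ wires[c][1])
            classical.append(gate)
        elif gate[0] == "X":
            q = gate[1]
            wires[q] = (wires[q][0], wires[q][1] ^ 1)
            classical.append(gate)
        else:
            mask, const = wires[gate[1]]
            conditions.append((mask, const, 1 if gate[0] == "T" else -1))
    return conditions, classical


def run_t_depth_one(circuit, bits):
    """Same map, with every T-like gate moved to one stage on ancillas."""
    conditions, classical = split(circuit, len(bits))
    x = sum(b << q for q, b in enumerate(bits))
    # Copy each condition into its own ancilla (uses CNOT and X only).
    ancillas = [(bin(mask & x).count("1") % 2) ^ const for mask, const, _ in conditions]
    # Single T-stage: one T or T-dagger per ancilla, all in parallel.
    phase = sum(sign * a for (_, _, sign), a in zip(conditions, ancillas))
    # Ancillas are then uncomputed (implicit here); finally re-compute the output.
    out, _ = run_sequential(classical, bits)
    return out, phase % 8


circuit = [("T", 0), ("CNOT", 0, 1), ("Tdg", 1), ("X", 2),
           ("CNOT", 1, 2), ("T", 2), ("CNOT", 0, 1)]
all_agree = all(
    run_sequential(circuit, b) == run_t_depth_one(circuit, b)
    for b in product((0, 1), repeat=3)
)
print(all_agree, len(split(circuit, 3)[0]))
```

Since both circuits are almost classical, agreement of output bits and phases on every basis state means that the two unitaries coincide. The example uses one ancilla per $T$-like gate, three in total.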



Remark 4.4. The gate set $C$ in Theorem 4.3 is not 
necessarily assumed to consist of Clifford gates. For example, 
if on some hypothetical architecture, T-gates are 
expensive but Toffoli gates are cheap, one can include 
the Toffoli gate in the set $C$. 



Remark 4.5. In general, the proof of Theorem 4.3 increases 
the size of the circuit, but only by a constant 
factor. In practice, it is often possible to find a much 
smaller circuit than the one constructed in the proof. 

Remark 4.6. Taking $C = \{S, X, \mathrm{CNOT}\}$ and applying 
Theorem 4.3 to circuit (1) (excluding the initial and final 
Hadamard gates) yields another T-depth one representation 
of the Toffoli gate. 

Remark 4.7. There is a trade-off between T-depth and 
the number of ancillas. The procedure of the proof of 
Theorem 4.3 adds one ancilla for each T-gate. However, 
by splitting a circuit with T-count $n$ into two circuits 
with T-count at most $\lceil n/2 \rceil$ each, it is clear that one can 
approximately halve the number of ancillas by doubling the 
T-depth, and so forth. 

Remark 4.8. Version 2 of [1], which appeared following 
the private communication credited as [24] therein, 
contains a similar result in Section 6.4, but with a proof 
that is quite different. 

5 Some circuits cannot be written 
with T-depth one 

The result of the previous section shows that any two 
T-stages can be combined into a single T-stage, provided 
that they are only separated by almost classical gates. 
One may wonder whether perhaps all Clifford+T circuits 
can be written with T-depth one, using a sufficient number 
of ancillas initialized to $|0\rangle$. We show that this cannot 
be done. 

Theorem 5.1. The single-qubit operator $THT$ cannot 
be implemented as a Clifford+T circuit of T-depth 1, 
using an arbitrary number of ancillas initialized to $|0\rangle$. 
This is true even if the ancillas are not required to be 
returned to their initial state at the end of the computation. 

Before proving the theorem, we start with a general 
observation about Clifford+T circuits of T-depth 1. 

Proposition 5.2. Let $U$ be an $n$-qubit Clifford+T circuit 
of T-depth 1. Let $|\phi\rangle$ be any single-qubit state, and 
consider 

$$|\psi\rangle = U(|\phi\rangle \otimes |0\rangle \otimes \cdots \otimes |0\rangle).$$

Consider the $\{+1, -1\}$-valued Pauli observable $X$ applied 
to the first qubit of $|\psi\rangle$; denote its expected value 
by $E_{|\phi\rangle}$. Suppose $E_{|+\rangle}$ is non-zero. Then 

$$\frac{E_{|0\rangle}}{E_{|+\rangle}}$$

is a rational number. 



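Proof. The expected value of the observable $X$ on the 
first qubit of $|\psi\rangle$ is 

$$E_{|\phi\rangle} = \langle\psi| (X \otimes I \otimes \cdots \otimes I) |\psi\rangle = \langle\phi, 0, \ldots, 0| \, U^\dagger (X \otimes I \otimes \cdots \otimes I) U \, |\phi, 0, \ldots, 0\rangle. \tag{16}$$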



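We analyze the structure of $U^\dagger (X \otimes I \otimes \cdots \otimes I) U$. Since 
$U$ is of T-depth 1, it can be written as $U = U_1 \circ U_2 \circ U_3$, 
where $U_1$ and $U_3$ are Clifford circuits and 
$U_2 = T \otimes \cdots \otimes T \otimes I \otimes \cdots \otimes I$, 
with $n_1$ denoting the number of $T$ factors. Since $U_1$ is Clifford, we know that 
$U_1^\dagger (X \otimes I \otimes \cdots \otimes I) U_1$ is a Pauli operator 

$$U_1^\dagger (X \otimes I \otimes \cdots \otimes I) U_1 = \pm A_1 \otimes \cdots \otimes A_n, \tag{17}$$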

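where each $A_i \in \{X, Y, Z, I\}$. Using the relations 

$$T^\dagger I T = I, \quad T^\dagger Z T = Z, \quad T^\dagger X T = \tfrac{1}{\sqrt{2}} X - \tfrac{1}{\sqrt{2}} Y, \quad T^\dagger Y T = \tfrac{1}{\sqrt{2}} X + \tfrac{1}{\sqrt{2}} Y,$$

we find that 

$$U_2^\dagger (\pm A_1 \otimes \cdots \otimes A_n) U_2 = \pm (T^\dagger A_1 T) \otimes \cdots \otimes (T^\dagger A_{n_1} T) \otimes A_{n_1+1} \otimes \cdots \otimes A_n = \lambda P_1 + \lambda P_2 + \cdots + \lambda P_m, \tag{18}$$

where each $P_j$ is an $n$-qubit Pauli operator. The key 
observation here is that the same factor $\lambda$ occurs in front 
of each (possibly signed) summand, and $\lambda$ is independent 
of $|\phi\rangle$. In fact, we have $\lambda = (1/\sqrt{2})^k$, where $k$ is the 
number of times the operators $X$ and $Y$ occur among 
$A_1, \ldots, A_{n_1}$. Let 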

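$$Q_j = U_3^\dagger P_j U_3. \tag{19}$$

Since $U_3$ is Clifford, this is again some Pauli operator, 
say 

$$Q_j = (-1)^{k_j} B_{j,1} \otimes \cdots \otimes B_{j,n}. \tag{20}$$

Combining (17) through (20), we find 

$$U^\dagger (X \otimes I \otimes \cdots \otimes I) U = \lambda Q_1 + \lambda Q_2 + \cdots + \lambda Q_m = \lambda \sum_{j=1}^{m} (-1)^{k_j} B_{j,1} \otimes \cdots \otimes B_{j,n}. \tag{21}$$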

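Combining this with (16), we get 

$$E_{|\phi\rangle} = \lambda \sum_{j=1}^{m} (-1)^{k_j} \langle\phi| B_{j,1} |\phi\rangle \, \langle 0| B_{j,2} |0\rangle \cdots \langle 0| B_{j,n} |0\rangle. \tag{22}$$

Since each $B_{j,i} \in \{X, Y, Z, I\}$ is a Pauli operator, it follows 
that $E_{|\phi\rangle}/\lambda$ is rational (indeed, an integer) for 
$|\phi\rangle \in \{|0\rangle, |+\rangle\}$. The claim then immediately follows. □ 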
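
The four conjugation relations for $T$ used in this proof can be confirmed with explicit $2 \times 2$ matrices. The following Python sketch is a numerical check only; the helper names are hypothetical, and $T = \mathrm{diag}(1, e^{i\pi/4})$ is assumed as elsewhere in this note.

```python
import cmath
from math import sqrt

OMEGA = cmath.exp(1j * cmath.pi / 4)
I2 = ((1, 0), (0, 1))
X = ((0, 1), (1, 0))
Y = ((0, -1j), (1j, 0))
Z = ((1, 0), (0, -1))
T = ((1, 0), (0, OMEGA))


def matmul(p, q):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def dagger(m):
    """Conjugate transpose of a 2x2 matrix."""
    return tuple(tuple(complex(m[j][i]).conjugate() for j in range(2)) for i in range(2))


def conjugate_by_t(a):
    """Compute T-dagger * a * T."""
    return matmul(dagger(T), matmul(a, T))


def combine(c1, p, c2, q):
    """Linear combination c1 * p + c2 * q of two matrices."""
    return tuple(tuple(c1 * p[i][j] + c2 * q[i][j] for j in range(2)) for i in range(2))


def close(p, q):
    return all(abs(p[i][j] - q[i][j]) < 1e-12 for i in range(2) for j in range(2))


r = 1 / sqrt(2)
checks = {
    "I": close(conjugate_by_t(I2), I2),
    "Z": close(conjugate_by_t(Z), Z),
    "X": close(conjugate_by_t(X), combine(r, X, -r, Y)),
    "Y": close(conjugate_by_t(Y), combine(r, X, r, Y)),
}
print(checks)
```

Each conjugated $X$ or $Y$ contributes one factor $1/\sqrt{2}$, which is the origin of the common factor $\lambda$ in the proof.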



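Proof of Theorem 5.1. For $U = THT$, we compute 

$$U^\dagger X U = \tfrac{1}{2} X + \tfrac{1}{2} Y + \tfrac{1}{\sqrt{2}} Z,$$

and therefore 

$$E_{|0\rangle} = \langle 0| U^\dagger X U |0\rangle = \tfrac{1}{\sqrt{2}}$$

and 

$$E_{|+\rangle} = \langle +| U^\dagger X U |+\rangle = \tfrac{1}{2}.$$

Since $E_{|0\rangle}/E_{|+\rangle} = \sqrt{2}$ is irrational, the claim immediately 
follows from Proposition 5.2. □ 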
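
The two expected values for $U = THT$ can be evaluated numerically. The Python sketch below computes them directly from the state vectors; the helper names are hypothetical. Floating-point arithmetic cannot establish irrationality, so the sketch only confirms that the ratio agrees with $\sqrt{2}$ to machine precision.

```python
import cmath
from math import sqrt

OMEGA = cmath.exp(1j * cmath.pi / 4)
R = 1 / sqrt(2)
T = ((1, 0), (0, OMEGA))
H = ((R, R), (R, -R))


def matmul(p, q):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def apply(m, v):
    """Apply a 2x2 matrix to a single-qubit state vector."""
    return tuple(m[i][0] * v[0] + m[i][1] * v[1] for i in range(2))


def expect_x(v):
    """Expected value <v|X|v> = 2 Re(conj(v0) * v1)."""
    return 2 * (complex(v[0]).conjugate() * v[1]).real


U = matmul(T, matmul(H, T))              # U = T H T
e_zero = expect_x(apply(U, (1, 0)))      # input state |0>
e_plus = expect_x(apply(U, (R, R)))      # input state |+>
ratio = e_zero / e_plus
print(e_zero, e_plus, ratio)
```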